One of our students taking our Kernel Debugging class recently brought in an excellent crash dump that demonstrates what we suspect is a multi-processor race condition in Windows NT 4.0. In this article we will demonstrate our analysis of the crash dump with an eye towards assisting our readers in analyzing their own crash dumps.
The key to any analysis is, of course, ensuring that you are using the right tools for the job. In analyzing this crash dump we used both WinDBG (Build 2127.1 – the version provided with the Windows 2000 RC2 DDK) and i386kd (again, the version from the Windows 2000 RC2 DDK). While we normally use WinDBG, because of what appear to be some temporary development issues we had to also use i386kd. As of the publication of this article, the current version of WinDBG (from the final release) is considerably more stable and we are expecting to see a new version to be demonstrated at WinHEC 2000.
Background
The crash dump was obtained from a quad processor system running Windows NT 4.0 SP5 (yes, you can use the Windows 2000 tools to debug on Windows NT 4.0) while running the latest version of the Microsoft Hardware Compatibility Tests (HCTs). The hardware platform worked with previous versions of the HCTs, but the latest versions exhibited a mysterious PAGE_FAULT_IN_NONPAGED_AREA error while running the tests.
Of course, the initial concern was that this was uncovering some hardware bug (which is always a possibility when running the HCT tests). Then, the HCT tests came under fire. When we started looking at it, we noted that the system crashed inside of the Windows NT operating system – no drivers were involved at all. Thus, we argued that it could not be the fault of the HCTs, because an application should never be able to crash the operating system.
Analyzing the Stop Code
A stop code of 0x50 (PAGE_FAULT_IN_NONPAGED_AREA) is actually one of the most common stop codes that we observe with Windows NT systems, and thus we’re quite familiar with analyzing it. This stop code occurs as a result of a page fault against an address within the system address space (normally between 0x80000000 and 0xFFFFFFFF) when that range of addresses does not, in fact, support paging. Most system addresses do not. Paged pool, of course, does support paging, and a page fault for a paged pool address would cause the system to retrieve the page from the paging file (assuming it is in the paging file).
Thus, this stop code occurs because some part of the system has accessed an invalid memory location, whether because it used an uninitialized data pointer, or a pointer to memory that has since been freed. Regardless, the Memory Manager considers this to be a critical error and it halts the system.
In this case, the four parameters tell us quite a lot about why the system crashed. The first parameter indicates the virtual address that was being accessed and the second parameter indicates whether the address was being read (a value of zero) or written (a value of one). The other two parameters aren’t generally useful in Windows NT 4.0 (we note that these values have changed meanings in Windows 2000, providing better information to simplify debugging).
In this case, the stop code was:
STOP: PAGE_FAULT_IN_NONPAGED_AREA (af3defc4, 0, 0, 0)
Thus, an attempt to read address af3defc4 failed. The address is “allowed” but isn’t one we normally see in use; that is probably because this system had 1GB of physical memory, which is also unusual (it yielded a large crash dump file, though!). Thus, this is most likely some sort of programming bug, whether because something is using an uninitialized memory location, is using a block of memory that has recently been freed, or has some other programming problem.
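The reading of the stop code parameters described above can be sketched in a few lines. This is only an illustration: the function name and the returned dictionary are our own, and the address range is the default 2GB system space mentioned earlier.

```python
import re

# Default system address space on 32-bit Windows NT, as described in the text.
SYSTEM_SPACE_START = 0x80000000
SYSTEM_SPACE_END = 0xFFFFFFFF


def decode_stop_0x50(line):
    """Decode the first two parameters of a PAGE_FAULT_IN_NONPAGED_AREA line.

    Hypothetical helper: parameter 1 is the faulting virtual address and
    parameter 2 is 0 for a read or 1 for a write.
    """
    match = re.search(
        r"\(\s*([0-9a-fA-F]+)\s*,\s*([0-9a-fA-F]+)\s*,"
        r"\s*([0-9a-fA-F]+)\s*,\s*([0-9a-fA-F]+)\s*\)",
        line,
    )
    if match is None:
        raise ValueError("no four-parameter list found in stop line")
    address = int(match.group(1), 16)
    access = int(match.group(2), 16)
    operation = {0: "read", 1: "write"}.get(access, "unknown")
    return {
        "address": address,
        "operation": operation,
        "in_system_space": SYSTEM_SPACE_START <= address <= SYSTEM_SPACE_END,
    }


info = decode_stop_0x50("STOP: PAGE_FAULT_IN_NONPAGED_AREA (af3defc4, 0, 0, 0)")
print(hex(info["address"]), info["operation"], info["in_system_space"])
```

For the stop line in this dump, the sketch reports a read of an address inside the system address space.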
Analyzing the Stack
Once we’ve determined that this is probably some programming bug, we start by looking at the stack that declared the halt. On a multi-processor system this isn’t simple, since the halt might not have occurred on CPU 0 – but of course the debuggers will start using the CPU 0 information as the default.
Thus, we normally start by looking at the stack for each processor in an attempt to identify which processor called KeBugCheckEx. In this case, we obtained the information shown in Figure 1.
0: kd> kv
cannot get version packet on a crash dump
cannot get version packet on a crash dump
ChildEBP RetAddr Args to Child
f766ce14 80003e47 80153f7c 00000000 00000000 ntkrnlmp!KeWaitForSingleObject+0x9a(FPO: [Non-Fpo]
f766ce34 8019ace2 7ffde000 77fa5560 00000000 halmps!ExAcquireFastMutex+0x2b (FPO: [0,2,0])
f766ce4c 8019aba1 00000001 b980ae08 b980ae58 ntkrnlmp!PspExitProcess+0x8c(FPO: [Non-Fpo]
f766ced0 8019a53c 00000000 f766cf04 0006fea4 ntkrnlmp!PspExitThread+0x447(FPO: [Non-Fpo]
f766cef4 80140da9 ffffffff 00000000 00000000 ntkrnlmp!NtTerminateProcess+0x13c(FPO: [Non-Fpo]
f766cef4 77f681ff ffffffff 00000000 00000000 ntkrnlmp!KiSystemService+0xc9 (FPO: [0,0] TrapFrame @ f766cf04)
f766cdf4 80153f70 b980aea4 00000000 00000000 0x77f681ff [Stdcall: 257]
0006ff5c 00000000 00000000 00000000 00000000 ntkrnlmp!PspActiveProcessMutex(FPO: [Non-Fpo]
0: kd> ~1
1: kd> kv
ChildEBP RetAddr Args to Child
f7b9ab24 80143e8f 00000000 af3defc4 00000000 ntkrnlmp!MmAccessFault+0x29a(FPO: [Non-Fpo]
f7b9ab24 8015c925 00000000 af3defc4 00000000 ntkrnlmp!KiTrap0E+0xc7 (FPO: [0,0] TrapFrame @ f7b9ab3c)
f7b9abb8 8015481a f7abeca0 b980ae08 00010000 ntkrnlmp!ExpCopyProcessInfo+0x11 (FPO: [2,0,3])
f7b9ac38 8015b811 00b40000 00010000 f7b9aec8 ntkrnlmp!ExpGetProcessInformation+0x156(FPO: [Non-Fpo]
f7b9aeec 80140da9 00000005 00b40000 00010000 ntkrnlmp!NtQuerySystemInformation+0x725(FPO: [Non-Fpo]
f7b9aeec 77f67e27 00000005 00b40000 00010000 ntkrnlmp!KiSystemService+0xc9 (FPO: [0,0] TrapFrame @ f7b9af04)
f7b9abac b2ec4ff0 f7abeca0 b2ec4e58 8015481a 0x77f67e27 [Stdcall: 257]
00b3fabc 00000000 00000000 00000000 00000000 0xffffffff`b2ec4ff0 [Stdcall: 257]
Figure 1 — Check Out Each Processor for Call to KeBugCheckEx
From this, then, we couldn’t actually tell which CPU had caused the halt (although we suspected it was CPU 1 – where the page fault occurred). Thus, we turned to the OEM Support Tools KD extension to give us a bit more stack information. We found that the stack for CPU 1 had called KeBugCheckEx (shown in Figure 2).
> !b.stack
T. Address RetAddr Called Procedure
*1 F7B9AAD0 8012E67A _KeBugCheckEx@20(00000050, AF3DEFC4, 00000000,...);
*0 F7B9AAFC 80118AE8 @KiFlushSingleTb@8(F7B9AB38, 801450C1, 80118AE8,...);
*0 F7B9AB04 801450C1 @FxsrSwapContextNotify@8(80118AE8, 80118AE8, 8011BB44,...);
*0 F7B9AB08 80118AE8 @KiFlushSingleTb@8(80118AE8, 8011BB44, 00000000,...);
*0 F7B9AB0C 80118AE8 @KiFlushSingleTb@8(8011BB44, 00000000, BC442E08,...);
*0 F7B9AB10 8011BB44 dword ptr EAX(00000000, BC442E08, FFFFF000,...);
*1 F7B9AB28 80143E8F _MmAccessFault@16(00000000, AF3DEFC4, 00000000,...);
*1 F7B9AB40 800031DA _KiIpiServiceRoutine@8(F7B9AB54, 800031E0, 0001001C,...);
*0 F7B9AB48 800031E0 _HalEndSystemInterrupt@8(0001001C, 000000E1, 00000010,...);
*0 F7B9AB64 80120DEB _MmMapLockedPagesSpecifyCache@24(00006C8E, 00000000, AC900023,...);
*1 F7B9ABBC 8015481A _ExpCopyProcessInfo@8(F7ABECA0, B980AE08, 00010000,...);
*1 F7B9AC3C 8015B811 _ExpGetProcessInformation@12(00B40000, 00010000, F7B9AEC8,...);
*0 F7B9AC58 F7C363A8 _NbtDereferenceDevice@4(B70D2E78, 80E6964C, 80E69528,...);
*1 F7B9AC74 801128AF dword ptr [ECX+EAX*4+38](B70D2E78, 80E69528, 0000004A,...);
*1 F7B9AC88 F7B49BBB @IofCallDriver@8(F7B9000E, 80E01279, 80E69400,...);
*1 F7B9ACAC 8012DF3E @KfReleaseSpinLock@8(F7B9ACDC, ABC8C008, C02AF230,...);
*1 F7B9ACC0 8012D140 @MiChargeCommitmentCantExpand@8(BCA7EFBC, 80150F30, 00000100,...);
*1 F7B9ACE0 8010A8BC _MmAllocateSpecialPool@12(00000100, 7366704E, 00000000,...);
*1 F7B9AD10 801134E1 @KfReleaseSpinLock@8(EBC40937, 00000000, EBC40938,...);
*1 F7B9AD14 EBC40937 _IoReleaseCancelSpinLock@4(00000000, EBC40938, A5332F00,...);
*1 F7B9AD5C EBC4624B _NpAddDataQueueEntry@24(801096D9, F7B9ADC0, A5332F00,...);
*0 F7B9AD60 801096D9 @KfReleaseSpinLock@8(F7B9ADC0, A5332F00, F7B9ADE8,...);
*0 F7B9ADA8 8012DDE8 @KfReleaseSpinLock@8(00000000, A8E9EFFC, C4000010,...);
*0 F7B9ADD0 8012DC15 @KfReleaseSpinLock@8(F7B9AE34, F7B9AE34, 00000000,...);
*1 F7B9ADE8 80131164 @MiInsertNode@8(00B4FFFF, 00B40000, C4000010,...);
*1 F7B9AE38 80181E56 _MiInsertVad@4(80181E9B, F7B9AF04, 00B3FA3C,...);
*0 F7B9AE3C 80181E9B @ExReleaseFastMutex@4(F7B9AF04, 00B3FA3C, 801813DE,...);
*0 F7B9AE84 80139804 @KfReleaseSpinLock@8(00000004, BC8EEFD4, 00010000,...);
*1 F7B9AEF0 80140DA9 dword ptr EBX(00000005, 00B40000, 00010000,...);
Figure 2 — Stack for CPU-1 Using KD Extension
Note that the KeBugCheckEx call is visible on this stack. For this function, its presence on the stack, even in a “ghost” stack frame, means it must have been called. After all, this is not a function that returns to the caller!
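The per-processor check can be expressed as a simple text search once each processor's stack listing has been saved. The sketch below is an illustration only: the function name is ours, and the sample text is abbreviated from Figures 1 and 2.

```python
def cpus_that_bugchecked(stacks_by_cpu):
    """Return the processors whose stack listing mentions KeBugCheckEx.

    Hypothetical helper: stacks_by_cpu maps a processor number to the text of
    that processor's stack listing, however it was captured.
    """
    return sorted(
        cpu for cpu, text in stacks_by_cpu.items() if "KeBugCheckEx" in text
    )


# Abbreviated listings taken from Figures 1 and 2.
stacks = {
    0: "ntkrnlmp!KeWaitForSingleObject+0x9a\n"
       "halmps!ExAcquireFastMutex+0x2b\n"
       "ntkrnlmp!PspExitProcess+0x8c\n",
    1: "_KeBugCheckEx@20(00000050, AF3DEFC4, 00000000,...);\n"
       "_MmAccessFault@16(00000000, AF3DEFC4, 00000000,...);\n",
}

print(cpus_that_bugchecked(stacks))
```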
Just a side note: if you aren’t using the OEM Support Tools package in your debugging, you’ve left a very powerful tool out of your toolbox. We’ve been using it for a few years now (version 3.0 was released in March) and around OSR we swear by it. There is a version (Version 2.0) included in the Windows 2000 final release, but the newer version (V3) is available from Microsoft’s website (V3 with symbols currently at http://download.microsoft.com/download/win2000srv/Utility/3.0/NT45/EN-US/oem3sr0s.zip...of course this will change).
Thus, we have the page fault that actually triggered the operation that caused the termination of the system. Note the KiTrap0E on the stack. That is the page fault handler function within the kernel, because Trap 14 (0x0E) is the page fault on the IA32 CPU. The fault occurred in the ExpCopyProcessInfo function.
This function, in turn, was invoked from function ExpGetProcessInformation. Unfortunately, we don’t have any source of information about the function ExpCopyProcessInfo (or ExpGetProcessInformation for that matter) although there is some (non-Microsoft and hence of suspect quality) information about the function NtQuerySystemInformation. However, based upon the name of ExpCopyProcessInfo we can guess that it is attempting to copy process data from an EPROCESS structure into a buffer. Thus, we probed the arguments to determine if one was in fact an EPROCESS structure. It turns out that the second parameter was, in fact an EPROCESS (Figure 3).
1: kd> !process b980ae08
!process b980ae08
PROCESS b980ae08 Cid: 0120 Peb: 7ffdf000 ParentCid: 007c
DirBase: 08c6f000 ObjectTable: 00000000 TableSize: 0.
Image: cgiapp.exe
VadRoot a856cfc8 Clone 0 Private 30. Modified 0. Locked 0.
B980AFC4 MutantState Signalled OwningThread 0
Process Lock Owned by Thread bf6b6dc0
Token b0834eb0
ElapsedTime 0:00:00.0500
UserTime 0:00:00.0015
KernelTime 0:00:00.0015
QuotaPoolUsage[PagedPool] 3713
QuotaPoolUsage[NonPagedPool] 832
Working Set Sizes (now,min,max) (145, 50, 345) (580KB, 200KB, 1380KB)
PeakWorkingSetSize 167
VirtualSize 4 Mb
PeakVirtualSize 9 Mb
PageFaultCount 164
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 36
THREAD bf6b6dc0 Cid 120.78 Teb: 00000000 Win32Thread: 00000000 RUNNING
Not impersonating
Owning Process b980ae08
WaitTime (seconds) 338550
Context Switch Count 53
UserTime 0:00:00.0000
KernelTime 0:00:00.0015
Start Address 0x77f0528c
Win32 Start Address 0x01001150
Stack Init f766d000 Current f766cc80 Base f766d000 Limit f766a000 Call 0
Priority 16 BasePriority 8 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr Args to Child
f766ce14 80003e47 80153f7c 00000000 00000000 ntkrnlmp!KeWaitForSingleObject+0x9a
f766ce34 8019ace2 7ffde000 77fa5560 00000000 halmps!ExAcquireFastMutex+0x2b
f766ce4c 8019aba1 00000001 b980ae08 b980ae58 ntkrnlmp!PspExitProcess+0x8c
f766ced0 8019a53c 00000000 f766cf04 0006fea4 ntkrnlmp!PspExitThread+0x447
f766cef4 80140da9 ffffffff 00000000 00000000 ntkrnlmp!NtTerminateProcess+0x13c
f766cef4 77f681ff ffffffff 00000000 00000000 ntkrnlmp!KiSystemService+0xc9
f766cdf4 80153f70 b980aea4 00000000 00000000 +0x77f681ff
0006ff5c 00000000 00000000 00000000 00000000 ntkrnlmp!PspActiveProcessMutex
Figure 3 — Stack of Process Exiting...Suspicions Raised
The stack trace in Figure 3 was most interesting because the process is exiting. From this we began to suspect that we might be observing an interesting bug: one process is gathering information about a second process, and the second process is terminating. The two threads are running simultaneously, one on CPU 0, the other on CPU 1.
The thread on CPU 0 (the thread in the terminating process) is entering a wait condition. It has not yet dispatched (so it is still running) but it has encountered an owned mutex and is going to wait for that mutex (we can determine this because of the call to ExAcquireFastMutex).
Alas, this does not conclusively demonstrate a bug, but it certainly raised our suspicions. We decided it was time to turn our attention to the faulting thread – running on CPU 1.
Interpreting the Trap Frame
We decided to track back through the code for the faulting thread. To accomplish this we used the trap frame information on the stack. In this case, it was simple to find because the debugger detected and reported the location of the trap frame to us. Had the debugger not told us where the trap frame was located, we would have looked for it (manually) on the stack. On the IA32 platform running Windows NT, the DS and ES segment registers contain the value 0x23 and thus we can identify the location of the trap frame by looking for these values (the DS segment register is stored 0x34 bytes from the beginning of the trap frame). This technique is actually described by Microsoft in Knowledge Base Article # Q159672.
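The manual search described above can be sketched as follows, assuming the stack has been captured as raw little-endian bytes. The function name and the synthetic stack contents are ours for illustration. We also assume that the two selectors sit in adjacent dwords, with the first one 0x34 bytes into the frame; check that offset against the Knowledge Base article before relying on it.

```python
import struct

SELECTOR_VALUE = 0x23        # value of DS and ES noted in the text
PAIR_OFFSET_IN_FRAME = 0x34  # assumption: offset of the first selector dword


def trap_frame_candidates(stack_bytes, base_address,
                          pair_offset=PAIR_OFFSET_IN_FRAME):
    """Return possible trap frame addresses found in a raw stack capture.

    Hypothetical helper: looks for two adjacent dwords equal to 0x23 and
    subtracts pair_offset from the address of the first one.
    """
    count = len(stack_bytes) // 4
    dwords = struct.unpack("<%dI" % count, stack_bytes[:count * 4])
    candidates = []
    for i in range(count - 1):
        if dwords[i] == SELECTOR_VALUE and dwords[i + 1] == SELECTOR_VALUE:
            candidates.append(base_address + i * 4 - pair_offset)
    return candidates


# Synthetic stack (not taken from the dump): zeros everywhere except a
# selector pair placed so the frame would start at 0xF7B9AB3C.
base = 0xF7B9AB00
dwords = [0] * 64
dwords[28] = SELECTOR_VALUE  # address base + 0x70
dwords[29] = SELECTOR_VALUE
raw = struct.pack("<64I", *dwords)

print([hex(a) for a in trap_frame_candidates(raw, base)])
```

Each candidate still has to be confirmed by hand, for example by checking that the saved EIP and CS values in the frame look plausible.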
The trap frame tells us what the values of the registers were at the time of the page fault. From this information, we can then work backwards to try figuring out what the code was actually doing at the time the system crashed. In this case the function we need to analyze had just been called – and this makes it easy for us to figure out what was going on. Thus, using the debugger, we generated a listing of the assembly code for this function (shown in Figure 4).
> !trap f7b9ab3c
eax=af3defb0 ebx=b2ec4e58 ecx=00005d28 edx=00000481 esi=f7abeca0 edi=b980ae08
eip=8015c925 esp=f7b9abb0 ebp=f7b9ac38 iopl=0 nv up ei ng nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010282
ErrCode = 00000000
8015C925 8B4014 mov eax,dword ptr [eax+14h]
Figure 4 — Let’s Check Out the Assembly Code!
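The trap frame is located at address 0xf7b9ab3c. The fault occurred at instruction address 0x8015c925 while attempting to read 0xaf3defc4, which is the value in the EAX register (0xaf3defb0) plus 0x14 and matches the first parameter of the stop code. Thus, the instruction: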
8015C925 8B4014 mov eax,dword ptr [eax+14h]
is attempting to retrieve some value in memory. Working backwards from this, we try to determine where this code segment came up with this particular value. Figure 5 shows the disassembly from the beginning of the current function (ExpCopyProcessInfo).
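The arithmetic behind the faulting access is easy to confirm from the register values in Figure 4. The helper name below is ours; the values come from the trap frame and the stop code.

```python
def effective_address(base, displacement):
    """Compute a 32-bit [base + displacement] effective address."""
    return (base + displacement) & 0xFFFFFFFF


eax = 0xAF3DEFB0                  # EAX from the trap frame
stop_code_address = 0xAF3DEFC4    # first parameter of the stop code

# mov eax,dword ptr [eax+14h]
faulting_address = effective_address(eax, 0x14)
print(hex(faulting_address), faulting_address == stop_code_address)
```

The computed address equals the address reported in the stop code, which confirms that this instruction is the one that faulted.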
> u 8015c914
NT!_ExpCopyProcessInfo@8+0x0:
8015C914 53 push ebx
8015C915 56 push esi
8015C916 57 push edi
8015C917 8B7C2414 mov edi,dword ptr [esp+14h]
8015C91B 8B8704010000 mov eax,dword ptr [edi+104h]
8015C921 85C0 test eax,eax
8015C923 740C je _ExpCopyProcessInfo@8+1Dh
8015C925 8B4014 mov eax,dword ptr [eax+14h]
Figure 5 — One Step Backwards...Disassembly from ExpCopyProcessInfo
Normally, when presented with a crash such as this one, we will attempt to work backwards from the current register values, following the trail of information back to see if we can determine what the problem was.
In this case, the contents of the EAX register were loaded from the location 0x104 bytes past the address contained in the EDI register. This would most likely be a dereference of some field within a data structure. That “data structure address” in turn was extracted from the stack (the stack pointer is ESP), specifically from 0x14 bytes above the current stack pointer. Since the previous three instructions pushed three values onto the stack (0xC bytes) and the function return address is also stored there, we note that this looks to be referencing parameter two (with parameter one at 0x10 from the current stack pointer). Don’t forget that stacks grow down, so values above the current stack address (at a positive offset) were placed on the stack earlier.
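The stack offset reasoning above can be written out as a small calculation. This is a sketch for a function without a frame pointer that has pushed a known number of registers; the function name is ours.

```python
def stack_slot(esp_offset, saved_registers, dword=4):
    """Describe what lives at [esp + esp_offset] after the register pushes.

    Hypothetical helper: assumes saved_registers dwords were pushed after
    entry, so the return address sits directly above them, followed by the
    parameters in order.
    """
    return_address_offset = saved_registers * dword
    if esp_offset < return_address_offset:
        return "saved register"
    if esp_offset == return_address_offset:
        return "return address"
    index = (esp_offset - return_address_offset) // dword
    return "parameter %d" % index


# ExpCopyProcessInfo pushes ebx, esi and edi before reading [esp+14h].
for offset in (0x08, 0x0C, 0x10, 0x14):
    print(hex(offset), stack_slot(offset, saved_registers=3))
```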
Since we noted earlier that parameter two is the EPROCESS, we believe this is consistent – that we are attempting to load some information from the EPROCESS structure. Thus, our next question becomes: what is located in the EPROCESS at offset 0x104. We use the “!strct” command (from kdex2x86) to display the format of the EPROCESS structure (See Figure 6).
> !strct eprocess
Structure EPROCESS (Size:0x1f8) member offsets:
+0000 Pcb(KPROCESS struct)
+0000 Header(DISPATCHER_HEADER struct)
+0010 ProfileListHead(LIST_ENTRY struct)
+0018 DirectoryTableBase
+0020 LdtDescriptor(KGDTENTRY struct)
+0028 Int21Descriptor(KIDTENTRY struct)
+0030 IopmOffset
+0032 Iopl
+0033 VdmFlag
+0034 ActiveProcessors
+0038 KernelTime
+003c UserTime
+0040 ReadyListHead(LIST_ENTRY struct)
+0048 SwapListEntry(LIST_ENTRY struct)
+0050 ThreadListHead(LIST_ENTRY struct)
+0058 ProcessLock
+005c Affinity
+0060 StackCount
+0062 BasePriority
+0063 ThreadQuantum
+0064 AutoAlignment
+0065 State
+0066 ThreadSeed
+0067 DisableBoost
+0068 ExitStatus
+006c LockEvent(KEVENT struct)
+006c Header(DISPATCHER_HEADER struct)
+007c LockCount
+0080 CreateTime
+0088 ExitTime
+0090 LockOwner
+0094 UniqueProcessId
+0098 ActiveProcessLinks(LIST_ENTRY struct)
+0098 Flink
+009c Blink
+00a0 QuotaPeakPoolUsage
+00a8 QuotaPoolUsage
+00b0 PagefileUsage
+00b4 CommitCharge
+00b8 PeakPagefileUsage
+00bc PeakVirtualSize
+00c0 VirtualSize
+00c8 Vm(MMSUPPORT struct)
+00c8 LastTrimTime
+00d0 LastTrimFaultCount
+00d4 PageFaultCount
+00d8 PeakWorkingSetSize
+00dc WorkingSetSize
+00e0 MinimumWorkingSetSize
+00e4 MaximumWorkingSetSize
+00e8 VmWorkingSetList
+00ec WorkingSetExpansionLinks(LIST_ENTRY struct)
+00f4 AllowWorkingSetAdjustment
+00f5 AddressSpaceBeingDeleted
+00f6 ForegroundSwitchCount
+00f7 MemoryPriority
+00f8 LastProtoPteFault
+00fc DebugPort
+0100 ExceptionPort
+0104 ObjectTable
+0108 Token
+010c WorkingSetLock(FAST_MUTEX struct)
+010c Count
+0110 Owner
+0114 Contention
+0118 Event(KEVENT struct)
+0128 OldIrql
+012c WorkingSetPage
+0130 ProcessOutswapEnabled
+0131 ProcessOutswapped
+0132 AddressSpaceInitialized
+0133 AddressSpaceDeleted
+0134 AddressCreationLock(FAST_MUTEX struct)
+0134 Count
+0138 Owner
+013c Contention
+0140 Event(KEVENT struct)
+0150 OldIrql
+0154 HyperSpaceLock
+0158 ForkInProgress
+015c VmOperation
+015e ForkWasSuccessful
+015f MmAgressiveWsTrimMask
+0160 VmOperationEvent
+0164 PageDirectoryPte(HARDWARE_PTE struct)
+0164 Valid
+0164 Write
+0164 Owner
+0164 WriteThrough
+0164 CacheDisable
+0164 Accessed
+0164 Dirty
+0164 LargePage
+0164 Global
+0164 CopyOnWrite
+0164 Prototype
+0164 reserved
+0164 PageFrameNumber
+0168 LastFaultCount
+016c ModifiedPageCount
+0170 VadRoot
+0174 VadHint
+0178 CloneRoot
+017c NumberOfPrivatePages
+0180 NumberOfLockedPages
+0184 NextPageColor
+0186 ExitProcessCalled
+0187 CreateProcessReported
+0188 SectionHandle
+018c Peb
+0190 SectionBaseAddress
+0194 QuotaBlock
+0198 LastThreadExitStatus
+019c WorkingSetWatch
+01a0 Win32WindowStation
+01a4 InheritedFromUniqueProcessId
+01a8 GrantedAccess
+01ac DefaultHardErrorProcessing
+01b0 LdtInformation
+01b4 VadFreeHint
+01b8 VdmObjects
+01bc ProcessMutant(KMUTANT struct)
+01bc Header(DISPATCHER_HEADER struct)
+01cc MutantListEntry(LIST_ENTRY struct)
+01d4 OwnerThread
+01d8 Abandoned
+01d9 ApcDisable
+01dc ImageFileName
+01ec VmTrimFaultValue
+01f0 SetTimerResolution
+01f1 PriorityClass
+01f2 SubSystemMinorVersion
+01f3 SubSystemMajorVersion
+01f2 SubSystemVersion
+01f4 Win32Process
> * esp+14 looks like Param2
> * eax is (esp+14)->(104)
> * Test for null
> * eax = *(eax+14)
Figure 6 — Using !strct to Reveal the Format of the EPROCESS Structure
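Looking up a member by offset in a listing like Figure 6 can be automated with a small parser. This sketch assumes the listing format shown in the figure; the function name is ours, and the excerpt is copied from the figure.

```python
import re


def members_at_offset(listing, offset):
    """Return the member names listed at a given offset.

    Hypothetical helper: nested structures repeat an offset, so every name
    found at that offset is returned, outermost first.
    """
    names = []
    for line in listing.splitlines():
        match = re.match(r"\s*\+([0-9a-fA-F]+)\s+(\w+)", line)
        if match and int(match.group(1), 16) == offset:
            names.append(match.group(2))
    return names


# Excerpt from the EPROCESS listing in Figure 6.
excerpt = """\
+00fc DebugPort
+0100 ExceptionPort
+0104 ObjectTable
+0108 Token
+010c WorkingSetLock(FAST_MUTEX struct)
+010c Count
"""

print(members_at_offset(excerpt, 0x104))
```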
Note offset 0x104: the ObjectTable. Looking back at the code disassembly, we note that after loading this value into the EAX register it is tested to ensure that it is not a NULL pointer:
8015C921 85C0 test eax,eax
8015C923 740C je _ExpCopyProcessInfo@8+1Dh
Since we are executing the instruction following the “je” we know that the jump was not taken, so the value loaded was non-NULL at that moment. Let us compare this result with the current contents of the data in memory. We accomplish this by dumping the contents of the EPROCESS structure using kdex2x86 (See Figure 7).
0: kd> !strct eprocess B980Ae08
Structure EPROCESS (Size:0x1f8) at 0xb980ae08:
+0000 Pcb(KPROCESS struct)
+0000 Header(DISPATCHER_HEADER struct)
+0010 ProfileListHead(LIST_ENTRY struct)
+0018 DirectoryTableBase = 08c6f000 21570000
+0020 LdtDescriptor(KGDTENTRY struct)
+0028 Int21Descriptor(KIDTENTRY struct)
+0030 IopmOffset = 20ad
+0032 Iopl = 00
+0033 VdmFlag = 00
+0034 ActiveProcessors = 00000001
+0038 KernelTime = 00000001
+003c UserTime = 00000001
+0040 ReadyListHead(LIST_ENTRY struct)
+0048 SwapListEntry(LIST_ENTRY struct)
+0050 ThreadListHead(LIST_ENTRY struct)
+0058 ProcessLock = 00000000
+005c Affinity = 0000000f
+0060 StackCount = 0001
+0062 BasePriority = 08
+0063 ThreadQuantum = 24
+0064 AutoAlignment = 00
+0065 State = 00
+0066 ThreadSeed = 54
+0067 DisableBoost = 00
+0068 ExitStatus(NTSTATUS) = 0(STATUS_SUCCESS)
+006c LockEvent(KEVENT struct)
+006c Header(DISPATCHER_HEADER struct)
+007c LockCount = 00000000
+0080 CreateTime(LARGE_INTEGER/ULARGE_INTEGER union) = following
+0080 None(Anonymous struct) = following
+0088 ExitTime(LARGE_INTEGER/ULARGE_INTEGER union) = following
+0088 None(Anonymous struct) = following
+0090 LockOwner = BF6B6DC0 (-> PKTHREAD)
+0094 UniqueProcessId = 00000120 (-> HANDLE)
+0098 ActiveProcessLinks(LIST_ENTRY struct)
+0098 Flink = BF2E4EA0 (-> PLIST_ENTRY)
+009c Blink = B2EC4EA0 (-> PLIST_ENTRY)
+00a0 QuotaPeakPoolUsage = 00000460 00002938
+00a8 QuotaPoolUsage = 00000340 00000e81
+00b0 PagefileUsage = 00000024
+00b4 CommitCharge = 00000024
+00b8 PeakPagefileUsage = 0000003a
+00bc PeakVirtualSize = 00905000
+00c0 VirtualSize = 004e5000
+00c8 Vm(MMSUPPORT struct)
+00c8 LastTrimTime(LARGE_INTEGER/ULARGE_INTEGER union) = following
+00d0 LastTrimFaultCount = 000000a2
+00d4 PageFaultCount = 000000a4
+00d8 PeakWorkingSetSize = 000000a7
+00dc WorkingSetSize = 00000091
+00e0 MinimumWorkingSetSize = 00000032
+00e4 MaximumWorkingSetSize = 00000159
+00e8 VmWorkingSetList = C0502000 (-> PMMWSL)
+00ec WorkingSetExpansionLinks(LIST_ENTRY struct)
+00f4 AllowWorkingSetAdjustment = 01
+00f5 AddressSpaceBeingDeleted = 00
+00f6 ForegroundSwitchCount = 00
+00f7 MemoryPriority = 00
+00f8 LastProtoPteFault = 00000000
+00fc DebugPort = 00000000
+0100 ExceptionPort = b3030f68
+0104 ObjectTable = 00000000 (-> PHANDLE_TABLE)
+0108 Token = B0834EB0 (-> PACCESS_TOKEN)
+010c WorkingSetLock(FAST_MUTEX struct)
+010c Count = 00000001
+0110 Owner = 00000000 (-> PKTHREAD)
+0114 Contention = 00000000
+0118 Event(KEVENT struct)
+0128 OldIrql = 0000003d
+012c WorkingSetPage = 0002ec71
+0130 ProcessOutswapEnabled = 00
+0131 ProcessOutswapped = 00
+0132 AddressSpaceInitialized = 01
+0133 AddressSpaceDeleted = 00
+0134 AddressCreationLock(FAST_MUTEX struct)
+0134 Count = 00000001
+0138 Owner = 00000000 (-> PKTHREAD)
+013c Contention = 00000000
+0140 Event(KEVENT struct)
+0150 OldIrql = 00000000
+0154 HyperSpaceLock = 00000000
+0158 ForkInProgress = 00000000 (-> PETHREAD)
+015c VmOperation = 0000
+015e ForkWasSuccessful = 00
+015f MmAgressiveWsTrimMask = 00
+0160 VmOperationEvent = 00000000 (-> PKEVENT)
+0164 PageDirectoryPte(HARDWARE_PTE struct)
+0168 LastFaultCount = 00000000
+016c ModifiedPageCount = 00000000
+0170 VadRoot = a856cfc8
+0174 VadHint = a856cfc8
+0178 CloneRoot = 00000000
+017c NumberOfPrivatePages = 0000001e
+0180 NumberOfLockedPages = 00000000
+0184 NextPageColor = 5d24
+0186 ExitProcessCalled = 01
+0187 CreateProcessReported = 00
+0188 SectionHandle = 00000004 (-> HANDLE)
+018c Peb = 7FFDF000 (-> PPEB)
+0190 SectionBaseAddress = 01000000
+0194 QuotaBlock = BDCEEFD0 (-> PEPROCESS_QUOTA_BLOCK)
+0198 LastThreadExitStatus(NTSTATUS) = 0(STATUS_SUCCESS)
+019c WorkingSetWatch = 00000000 (-> PPAGEFAULT_HISTORY)
+01a0 Win32WindowStation = 00000000 (-> HANDLE)
+01a4 InheritedFromUniqueProcessId = 0000007C (-> HANDLE)
+01a8 GrantedAccess(ACCESS_MASK) = 1f0fff( STANDARD_RIGHTS_ALL )
+01ac DefaultHardErrorProcessing = 00008000
+01b0 LdtInformation = 00000000
+01b4 VadFreeHint = ba2cafc8
+01b8 VdmObjects = 00000000
+01bc ProcessMutant(KMUTANT struct)
+01bc Header(DISPATCHER_HEADER struct)
+01cc MutantListEntry(LIST_ENTRY struct)
+01d4 OwnerThread = 00000000 (-> PKTHREAD)
+01d8 Abandoned = 00
+01d9 ApcDisable = 00
+01dc ImageFileName = cgiapp.exe...... 63 67 69 61 70 70 2e 65 78 65 00 00 00 00 00 00
+01ec VmTrimFaultValue = 00000000
+01f0 SetTimerResolution = 00
+01f1 PriorityClass = 02
+01f2 SubSystemMinorVersion = 00
+01f3 SubSystemMajorVersion = 04
+01f2 SubSystemVersion = 0400
+01f4 Win32Process = 00000000
Figure 7 — Compare With Data in Memory
*Note the value at offset 0x104 – it is NULL!
We terminated our analysis at this stage, believing that it was likely we had found a multiprocessor race condition within Windows NT. Specifically, the object handle table had been deleted and deallocated at the same time a separate thread was attempting to dereference it. We concluded this was sufficient analysis to report to Microsoft and that further study on our part would be inconclusive.
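The sequence we suspect can be illustrated with a deterministic simulation. This is not NT code: the classes, names and the stored value 7 are invented for illustration, and the interleaving is our hypothesis about what happened, not something the dump proves. Only the two addresses and the offsets 0x104 and 0x14 come from the dump.

```python
class PageFault(Exception):
    """Raised by the fake memory when an unmapped address is read."""


class FakeMemory:
    """Toy memory model: a dictionary of valid dword addresses."""

    def __init__(self):
        self.cells = {}

    def write(self, address, value):
        self.cells[address] = value

    def read(self, address):
        if address not in self.cells:
            raise PageFault(hex(address))
        return self.cells[address]

    def free(self, address):
        del self.cells[address]


OBJECT_TABLE_FIELD = 0x104  # EPROCESS.ObjectTable, from Figure 6
TABLE_FIELD = 0x14          # field read by the faulting instruction


def query_process(memory, process, exit_runs_between_check_and_use):
    """Replay the reader's instructions, optionally interleaving the exit."""
    table = memory.read(process + OBJECT_TABLE_FIELD)  # mov eax,[edi+104h]
    if table == 0:                                     # test eax,eax / je
        return "skipped: no handle table"
    if exit_runs_between_check_and_use:
        # Hypothesized work of the exiting thread on the other processor.
        memory.write(process + OBJECT_TABLE_FIELD, 0)
        memory.free(table + TABLE_FIELD)
    try:
        return memory.read(table + TABLE_FIELD)        # mov eax,[eax+14h]
    except PageFault as fault:
        return "page fault at " + str(fault)


def make_state():
    memory = FakeMemory()
    process, table = 0xB980AE08, 0xAF3DEFB0  # addresses seen in the dump
    memory.write(process + OBJECT_TABLE_FIELD, table)
    memory.write(table + TABLE_FIELD, 7)     # arbitrary field contents
    return memory, process


quiet_memory, quiet_process = make_state()
racy_memory, racy_process = make_state()
quiet_result = query_process(quiet_memory, quiet_process, False)
racy_result = query_process(racy_memory, racy_process, True)
print(quiet_result)
print(racy_result)
```

In the racing case the reader has already passed the NULL test when the table disappears, so it faults on the stale pointer, and the ObjectTable field reads as NULL afterwards, which is the state Figure 7 shows.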
After class, one of the students who had access to the relevant NT source code advised us that the field being accessed was a count field in the object handle table. He was unable to ascertain why the access to the field became invalid during use, but he confirmed our analysis.
This type of system level damage is common to MP race condition problems, where the problem occurs only under specific loads (such as running a new set of HCT tests with what may have been slightly different behavior characteristics) and ultimately leads not to direct analysis that demonstrates the problem but to a system state that is inconsistent (such as this one).
We do not know if Microsoft has accepted this problem as a legitimate bug or if it is resolved in subsequent versions of Windows NT or Windows 2000. Perhaps one of our loyal readers has more information?